To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-Order Feedback from Multiple Histories
Abstract
This paper investigates a novel multiple-history long short-term memory (MH-LSTM) RNN acoustic model that mitigates the robustness problem posed by noisy training targets, i.e., mis-labeled data and/or mis-alignments. Conceptually, after the RNN is unfolded in time, the hidden units in each layer are re-arranged into ordered sub-layers, with a master sub-layer on top and a set of auxiliary sub-layers below it. Only the master sub-layer generates outputs for the next layer; the auxiliary sub-layers run in parallel with the master sub-layer but with increasing time lags. Each sub-layer also receives higher-order feedback from a fixed number of sub-layers below it. As a result, each sub-layer maintains a different history of the input speech, and the ensemble of these different histories underpins the model's robustness. The higher-order connections not only provide shorter feedback paths along which error signals can propagate to hidden states farther back in time, better modeling long-term memory, but also give each model parameter more feedback paths, smoothing its updates during training. Phoneme recognition results on both real TIMIT data and synthetic TIMIT data with noisy labels or alignments show that the new model outperforms a conventional LSTM RNN model.
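To make the architecture concrete, below is a minimal PyTorch sketch of one plausible reading of an MH-LSTM layer: `num_sub` sub-layers (sub-layer 0 is the master), each an `nn.LSTMCell`, where sub-layer k consumes the input delayed by k frames and receives additive higher-order feedback from up to `order` sub-layers below it. The class name, the additive feedback injection, and all hyper-parameter names are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: one plausible reading of an MH-LSTM layer.
# Names (MHLSTMLayer, num_sub, order) and the additive feedback injection
# are assumptions; the paper's exact gating/wiring may differ.
import torch
import torch.nn as nn

class MHLSTMLayer(nn.Module):
    def __init__(self, input_size, hidden_size, num_sub=3, order=2):
        super().__init__()
        self.num_sub, self.order = num_sub, order
        # Sub-layer 0 is the master; sub-layers 1..num_sub-1 are auxiliary.
        self.cells = nn.ModuleList(
            nn.LSTMCell(input_size, hidden_size) for _ in range(num_sub))
        # One projection per (sub-layer, lower-sub-layer) feedback pair.
        self.fb = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size, bias=False)
            for _ in range(num_sub * order))

    def forward(self, x):                       # x: (T, B, input_size)
        T, B, F = x.shape
        H = self.cells[0].hidden_size
        h = [x.new_zeros(B, H) for _ in range(self.num_sub)]
        c = [x.new_zeros(B, H) for _ in range(self.num_sub)]
        out = []
        for t in range(T):
            nh, nc = [], []
            for k in range(self.num_sub):
                # Sub-layer k runs k frames behind the master (its time lag).
                xt = x[t - k] if t >= k else x.new_zeros(B, F)
                # Higher-order feedback from up to `order` sub-layers below,
                # injected additively into the recurrent hidden state.
                rec = h[k]
                for j in range(self.order):
                    if k + 1 + j < self.num_sub:
                        rec = rec + self.fb[k * self.order + j](h[k + 1 + j])
                hk, ck = self.cells[k](xt, (rec, c[k]))
                nh.append(hk)
                nc.append(ck)
            h, c = nh, nc
            out.append(h[0])        # only the master sub-layer emits output
        return torch.stack(out)     # (T, B, hidden_size)

# Usage: 20 frames of 40-dim features, batch of 4.
layer = MHLSTMLayer(input_size=40, hidden_size=64)
print(layer(torch.randn(20, 4, 40)).shape)  # torch.Size([20, 4, 64])
```

Note that only the master sub-layer's hidden states are emitted, matching the abstract's description that the auxiliary sub-layers influence the output solely through the feedback connections.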
Similar Resources
Fast and accurate recurrent neural network acoustic models for speech recognition
We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence trained context dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained phone models in...
Long short-term memory recurrent neural network architectures for large scale acoustic modeling
Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture that was designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we explore LSTM RNN architectures for large scale acoustic modeling in speech recognition. We recently showed that LSTM RNNs are more effective than DNNs and conventional RNN...
Acoustic Modeling in Statistical Parametric Speech Synthesis – from HMM to LSTM-RNN
Statistical parametric speech synthesis (SPSS) combines an acoustic model and a vocoder to render speech given a text. Typically decision tree-clustered context-dependent hidden Markov models (HMMs) are employed as the acoustic model, which represent a relationship between linguistic and acoustic features. Recently, artificial neural network-based acoustic models, such as deep neural networks, ...
High Order Recurrent Neural Networks for Acoustic Modelling
Vanishing long-term gradients are a major issue in training standard recurrent neural networks (RNNs), which can be alleviated by long short-term memory (LSTM) models with memory cells. However, the extra parameters associated with the memory cells mean an LSTM layer has four times as many parameters as an RNN with the same hidden vector size. This paper addresses the vanishing gradient problem...
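As a rough companion to this snippet, here is a minimal sketch of the generic higher-order recurrence it alludes to, h_t = tanh(W x_t + Σ_j U_j h_{t-j}): a vanilla RNN cell that conditions on the last `order` hidden states instead of only the most recent one, avoiding the 4x parameter overhead of LSTM gates. The class name and wiring are assumptions for illustration, not the cited paper's exact formulation.

```python
# Generic higher-order RNN recurrence: h_t = tanh(W x_t + sum_j U_j h_{t-j}).
# Illustrative only; the cited paper's exact formulation may differ.
import torch
import torch.nn as nn

class HigherOrderRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, order=3):
        super().__init__()
        self.order = order
        self.w_in = nn.Linear(input_size, hidden_size)
        self.u_rec = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size, bias=False)
            for _ in range(order))

    def forward(self, x, history):  # history: past hidden states, newest first
        s = self.w_in(x)
        for j, h_prev in enumerate(history[: self.order]):
            s = s + self.u_rec[j](h_prev)
        return torch.tanh(s)

# Usage: run the cell over 8 frames, keeping the newest states first.
cell = HigherOrderRNNCell(input_size=40, hidden_size=64)
x, history = torch.randn(8, 4, 40), []
for t in range(8):
    history.insert(0, cell(x[t], history))
print(history[0].shape)  # torch.Size([4, 64])
```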
Semi-Supervised Training in Deep Learning Acoustic Model
We studied semi-supervised training in a fully connected deep neural network (DNN), an unfolded recurrent neural network (RNN), and a long short-term memory recurrent neural network (LSTM-RNN) with respect to transcription quality, importance-based data sampling, and training data amount. We found that the DNN, unfolded RNN, and LSTM-RNN are increasingly sensitive to labeling errors. For ...
Publication year: 2017